Cross-Domain Bootstrapping for Named Entity Recognition

Authors

  • Ang Sun
  • Ralph Grishman
Abstract

We propose a general cross-domain bootstrapping algorithm for domain adaptation in the task of named entity recognition. We first generalize the lexical features of the source domain model with word clusters generated from a joint corpus. We then select target domain instances based on multiple criteria during the bootstrapping process. Without using annotated data from the target domain and without explicitly encoding any target-domain-specific knowledge, we were able to improve the source model’s F-measure by 7 points on the target domain.
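The two steps named in the abstract (cluster-generalized lexical features, then selective addition of target-domain instances) can be illustrated with a minimal self-training sketch. This is not the authors' implementation: the abstract does not specify the model, the clustering method, or the selection criteria. The count-based classifier, the `clusters` dictionary of cluster ids, and the single confidence `threshold` used below are all assumptions standing in for those details.

```python
from collections import Counter, defaultdict

def featurize(token, clusters):
    """Lexical feature plus a word-cluster feature (cluster ids are assumed given)."""
    word = token.lower()
    feats = {"w=" + word}
    if word in clusters:
        feats.add("c=" + clusters[word])
    return feats

def train(labeled, clusters):
    """Count feature/label co-occurrences (a toy stand-in for a real NER model)."""
    counts = defaultdict(Counter)
    for token, label in labeled:
        for feat in featurize(token, clusters):
            counts[feat][label] += 1
    return counts

def predict(model, token, clusters):
    """Return (label, confidence) by summing label votes over the token's features."""
    votes = Counter()
    for feat in featurize(token, clusters):
        votes.update(model.get(feat, {}))
    if not votes:
        return "O", 0.0  # no evidence at all: default label, zero confidence
    label, n = votes.most_common(1)[0]
    return label, n / sum(votes.values())

def bootstrap(source_labeled, target_unlabeled, clusters, threshold=0.9, rounds=3):
    """Self-training: add confidently labeled target tokens, then retrain.

    The single confidence threshold is an assumed placeholder for the
    paper's "multiple criteria", which the abstract does not describe.
    """
    labeled = list(source_labeled)
    pool = list(target_unlabeled)
    for _ in range(rounds):
        model = train(labeled, clusters)
        keep, rest = [], []
        for token in pool:
            label, conf = predict(model, token, clusters)
            if conf >= threshold:
                keep.append((token, label))
            else:
                rest.append(token)
        if not keep:
            break  # nothing new was selected, so further rounds change nothing
        labeled.extend(keep)
        pool = rest
    return train(labeled, clusters)

# Hypothetical toy data: cluster ids are made up for illustration.
clusters = {"london": "0110", "paris": "0110", "kabul": "0110",
            "said": "1010", "stated": "1010"}
source = [("London", "LOC"), ("Paris", "LOC"), ("said", "O"), ("the", "O")]
target = ["Kabul", "stated", "xyz"]
model = bootstrap(source, target, clusters)
```

In this toy setup the source data never contains "Kabul", but the token shares a cluster with "London" and "Paris", so the cluster feature lets the source model label it confidently enough to be added to the training set. A token with no known word or cluster feature stays in the unlabeled pool.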


Similar resources

Integrated Machine Learning Techniques for Arabic Named Entity Recognition

Named Entity Recognition (NER) task has become essential to improve the performance of many NLP tasks. Its aim is to endeavor a solution to boost accurately the identification of extracted named entities. This paper presents a novel solution for Arabic Named Entity Recognition (ANER) problem. The solution is an integration approach between two machine learning techniques, namely bootstrapping s...


Domain adaptive bootstrapping for named entity recognition

Bootstrapping is the process of improving the performance of a trained classifier by iteratively adding data that is labeled by the classifier itself to the training set, and retraining the classifier. It is often used in situations where labeled training data is scarce but unlabeled data is abundant. In this paper, we consider the problem of domain adaptation: the situation where training data...


Optimising Selective Sampling for Bootstrapping Named Entity Recognition

Training a statistical named entity recognition system in a new domain requires costly manual annotation of large quantities of in-domain data. Active learning promises to reduce the annotation cost by selecting only highly informative data points. This paper is concerned with a real active learning experiment to bootstrap a named entity recognition system for a new domain of radio astronomical...


Gazetteer Preparation for Named Entity Recognition in Indian Languages

This paper describes our approaches for the preparation of gazetteers for named entity recognition (NER) in Indian languages. We have described two methodologies for the preparation of gazetteers. Since the relevant gazetteer lists are more easily available in English, we have used a transliteration-based approach to convert available English name lists to Indian languages. The second approach ...


Bootstrapping Named Entity Recognition with Automatically Generated Gazetteer Lists

Current Named Entity Recognition systems suffer from the lack of hand-tagged data as well as degradation when moving to other domains. This paper explores two aspects: the automatic generation of gazetteer lists from unlabeled data; and the building of a Named Entity Recognition system with labeled and unlabeled data.




Publication date: 2011